Characterizing In-Text Citations Using N-Gram Distributions

Authors

  • Marc Bertin
  • Iana Atanassova
Abstract

Introduction: This article presents a Natural Language Processing (NLP) approach to analyzing citation functions in scientific papers. Bibliometric studies traditionally rely on citation metadata and count the number of times a publication has been cited. However, some recent studies also rely on full-text processing of papers, e.g. (Boyack et al., 2013), (Bertin et al., 2013, 2014). The full text of papers, and more specifically the sentences containing citations, provides valuable information on the functions of citations that can be exploited through NLP. To study citation acts, we need to consider full-text papers and their rhetorical structure. The main question we want to answer here is whether the most frequent citation patterns are correlated with the rhetorical structure of scientific papers. We investigate the properties of the linguistic patterns that appear in citation contexts. To do so, we study the distribution of n-gram classes containing verb forms, and we show the existence of three different types of distributions according to the rhetorical structure.
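The idea of counting verb-bearing n-grams in citation sentences, grouped by rhetorical section, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the verb list `VERB_FORMS`, the section labels, the `[REF]` placeholder for a citation, and the sample sentences are all assumptions made for illustration, and a real study would presumably identify verb forms with a part-of-speech tagger rather than a fixed word list.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, tiny list of verb forms (illustration only).
# A real analysis would rely on a POS tagger instead.
VERB_FORMS = {"show", "showed", "shown", "propose", "proposed",
              "use", "used", "found", "reported"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def verb_ngrams(tokens, n=3):
    """Yield every n-gram that contains at least one known verb form."""
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if any(tok in VERB_FORMS for tok in gram):
            yield gram


def ngram_distribution_by_section(citation_sentences, n=3):
    """Count verb-bearing n-grams separately for each section label."""
    counts = defaultdict(Counter)
    for section, sentence in citation_sentences:
        counts[section].update(verb_ngrams(tokenize(sentence), n))
    return counts


# Invented sample data: (section label, citation sentence).
sample = [
    ("introduction", "It has been shown in [REF] that citations vary."),
    ("methods", "We used the method proposed in [REF]."),
]

dist = ngram_distribution_by_section(sample)
for section, counter in dist.items():
    print(section, counter.most_common(3))
```

Comparing the per-section counters (for instance after normalizing by the number of citation sentences in each section) is one simple way to see whether frequent patterns differ across the rhetorical structure.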

Similar resources

Characterizing in-text citations in scientific articles: A large-scale analysis

We report characteristics of in-text citations in over five million full text articles from two large databases – the PubMed Central Open Access subset and Elsevier journals – as functions of time, textual progression, and scientific field. The purpose of this study is to understand the characteristics of in-text citations in a detailed way prior to pursuing other studies focused on answering m...

Citations in the Digital Library of Classics: Extracting Canonical References by Using Conditional Random Fields

Scholars of Classics cite ancient texts by using abridged citations called canonical references. In the scholarly digital library, canonical references create a complex textile of links between ancient and modern sources reflecting the deep hypertextual nature of texts in this field. This paper aims to demonstrate the suitability of Conditional Random Fields (CRF) for extracting this particular...

Recognition of Persian Texts Using an N-Gram Language Model and Grammatical Refinement

Text recognition has been a growing research topic in recent years. Much of this research has focused on recognizing letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

Natural Language Generation for Text-to-Text Applications Using an Information-Slim Representation

I propose a representation formalism and algorithms to be used in a new language generation mechanism for text-to-text applications. The generation process is driven by both text-specific information encoded via probability distributions over words and phrases derived from the input text, and general language knowledge captured by n-gram and syntactic language models. A Text-to-Text Perspective...

Sentiment analysis of scientific citations

While there has been growing interest in the field of sentiment analysis for different text genres in the past few years, relatively less emphasis has been placed on extraction of opinions from scientific literature, more specifically, citation...

Publication date: 2015